229 research outputs found

    Robust optimization of SVM hyperparameters in the classification of bioactive compounds

    Background: Support Vector Machine has become one of the most popular machine learning tools used in virtual screening campaigns aimed at finding new drug candidates. Although it can be extremely effective in finding new potentially active compounds, its application requires the optimization of the hyperparameters with which the assessment is run, particularly the C and γ values. This requirement in turn establishes the need for fast and effective optimization procedures that provide the best predictive power of the constructed model. Results: In this study, we investigated Bayesian and random search optimization of Support Vector Machine hyperparameters for classifying bioactive compounds. The effectiveness of these strategies was compared with the most popular optimization procedures: grid search and heuristic choice. We demonstrated that Bayesian optimization not only provides better, more efficient classification but is also much faster: the number of iterations it required to reach optimal predictive performance was the lowest of all the tested optimization methods. Moreover, for the Bayesian approach, the choice of parameters in subsequent iterations is directed and justified; therefore, the results obtained with it improve steadily, and the range of hyperparameters tested provides the best overall performance of Support Vector Machine. Additionally, we showed that random search optimization of hyperparameters leads to significantly better performance than grid search and heuristic-based approaches. Conclusions: The Bayesian approach to the optimization of Support Vector Machine parameters was demonstrated to outperform other optimization methods for tasks concerned with the bioactivity assessment of chemical compounds. This strategy not only provides a higher accuracy of classification, but is also much faster and more directed than other approaches for optimization. It appears that, despite its simplicity, the random search optimization strategy should be used as a second choice if application of the Bayesian approach is not feasible.
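The difference between grid search and random search over C and γ can be sketched in a few lines. This is only an illustration of the two sampling schemes, not the authors' protocol: `toy_score` is a hypothetical stand-in for a cross-validated SVM score, the search ranges are assumed, and Bayesian optimization is not implemented here.

```python
import math
import random


def toy_score(C, gamma):
    """Hypothetical stand-in for cross-validated SVM accuracy (assumption).

    Peaks near log10(C) = 0.7 and log10(gamma) = -2.3; not taken from the paper.
    """
    dc = math.log10(C) - 0.7
    dg = math.log10(gamma) + 2.3
    return math.exp(-(dc * dc + dg * dg))


def grid_search(score, exponents_C, exponents_gamma):
    """Evaluate every (C, gamma) pair on a log10 grid; return (score, C, gamma)."""
    candidates = [(10.0 ** a, 10.0 ** b) for a in exponents_C for b in exponents_gamma]
    return max((score(C, g), C, g) for C, g in candidates)


def random_search(score, n_iter, log_range_C, log_range_gamma, seed=0):
    """Draw (C, gamma) log-uniformly n_iter times; return the best (score, C, gamma)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        C = 10.0 ** rng.uniform(*log_range_C)
        g = 10.0 ** rng.uniform(*log_range_gamma)
        trial = (score(C, g), C, g)
        if best is None or trial > best:
            best = trial
    return best


# Same budget for both strategies: 5 x 5 grid points vs. 25 random draws.
grid_best = grid_search(toy_score, range(-2, 3), range(-4, 1))
random_best = random_search(toy_score, 25, (-2.0, 2.0), (-4.0, 0.0))
print("grid  :", grid_best)
print("random:", random_best)
```

In a real experiment `toy_score` would be replaced by a function that trains an SVM with the given C and γ and returns a validation metric; the outcome on this toy surface says nothing about the comparison reported in the abstract.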

    The influence of negative training set size on machine learning-based virtual screening

    BACKGROUND: The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. RESULTS: The impact of this rather neglected aspect of applying machine learning methods was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluation parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of the dynamics of those variations allowed us to recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, IBk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with the SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. CONCLUSIONS: The ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of a particular classifier. Moreover, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.
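For reference, the three evaluation parameters named above (precision, recall and MCC) can be computed from confusion-matrix counts as in the following minimal sketch. The function name and the example counts are hypothetical and are not data from the study.

```python
import math


def screening_metrics(tp, fp, tn, fn):
    """Return (precision, recall, mcc) from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 by convention here.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical counts: 100 actives and 900 inactives in the screened set.
precision, recall, mcc = screening_metrics(tp=80, fp=20, tn=880, fn=20)
print(f"precision={precision:.3f} recall={recall:.3f} MCC={mcc:.3f}")
```

Unlike precision and recall, MCC uses all four cells of the confusion matrix, which makes it a useful single summary when actives and inactives are strongly imbalanced.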

    The influence of the inactives subset generation on the performance of machine learning methods

    Background: The application of machine learning methods in virtual screening, in both classification and regression tasks, has grown in popularity over the past few years. However, their effectiveness is strongly dependent on many different factors. Results: In this study, the influence of the way the set of inactives is formed on the classification process was examined: random and diverse selection from the ZINC database, the MDDR database, and libraries generated according to the DUD methodology. All learning methods were tested in two modes: using one test set, the same for each method of generating inactive molecules, and using test sets with inactives prepared in the same way as for training. The experiments were carried out for 5 different protein targets, 3 fingerprints for molecule representation and 7 classification algorithms with varying parameters. The process of forming the inactive set had a substantial impact on the performance of the machine learning methods. Conclusions: The level of chemical space limitation determined the ability of the tested classifiers to select potentially active molecules in virtual screening tasks; for example, DUDs (widely applied in docking experiments) did not provide proper selection of active molecules from databases with diverse structures. The study clearly showed that the inactive compounds forming the training set should be as representative as possible of the libraries that undergo screening.

    Multiple conformational states in retrospective virtual screening : homology models vs. crystal structures : beta-2 adrenergic receptor case study

    Background: Distinguishing active from inactive compounds is one of the crucial problems of molecular docking, especially in the context of virtual screening experiments. The randomization of poses and the natural flexibility of the protein make this discrimination even harder. Some of the recent approaches to post-docking analysis use an ensemble of receptor models to mimic this naturally occurring conformational diversity. However, the optimal number of receptor conformations is yet to be determined. In this study, we compare the results of a retrospective screening of beta-2 adrenergic receptor ligands performed on both an ensemble of receptor conformations extracted from ten available crystal structures and an equal number of homology models. Additional analysis was also performed for homology models with up to 20 receptor conformations considered. Results: The docking results were encoded into Structural Interaction Fingerprints and were automatically analyzed by a support vector machine. The use of homology models in this virtual screening application proved superior to the use of crystal structures. Additionally, increasing the number of receptor conformational states improved the discrimination of active from inactive compounds. Conclusions: For virtual screening purposes, the use of homology models was found to be most beneficial, even in the presence of crystallographic data regarding the conformational space of the receptor. The results also showed that increasing the number of receptor conformations considered improves the effectiveness of identifying active compounds by the machine learning method.

    Structural determinants influencing halogen bonding : a case study on azinesulfonamide analogs of aripiprazole as 5-HT1A, 5-HT7, and D2 receptor ligands

    This work was supported by Grant KNW-1-015/K/7/O from Medical University of Silesia, Katowice, Poland. Calculations have been carried out using resources provided by Wroclaw Centre for Networking and Supercomputing (http://wcss.pl), Grant No. 382. A series of azinesulfonamide derivatives of long-chain arylpiperazines with variable-length alkylene spacers between sulfonamide and 4-arylpiperazine moiety is designed, synthesized, and biologically evaluated. In vitro methods are used to determine their affinity for serotonin 5-HT1A, 5-HT6, 5-HT7, and dopamine D2 receptors. X-ray analysis, two-dimensional NMR conformational studies, and docking into the 5-HT1A and 5-HT7 receptor models are then conducted to investigate the conformational preferences of selected serotonin receptor ligands in different environments. The bent conformation of tetramethylene derivatives is found in a solid state, in dimethyl sulfoxide, and as a global energy minimum during conformational analysis in a simulated water environment. Furthermore, ligand geometry in top-scored complexes is also bent, with one torsion angle in the spacer (τ2) in synclinal conformation. Molecular docking studies indicate the role of halogen bonding in complexes of the most potent ligands and target receptors.

    Fingerprint-Based Machine Learning Approach to Identify Potent and Selective 5-HT2BR Ligands

    The identification of subtype-selective GPCR (G-protein coupled receptor) ligands is a challenging task. In this study, we developed a computational protocol to find compounds with 5-HT2BR versus 5-HT1BR selectivity. Our approach employs the hierarchical combination of machine learning methods, docking, and multiple scoring methods. First, we applied machine learning tools to filter a large database of druglike compounds by the new Neighbouring Substructures Fingerprint (NSFP). This two-dimensional fingerprint contains information on the connectivity of the substructural features of a compound. Preselected subsets of the database were then subjected to docking calculations. The main indicators of compounds’ selectivity were their different interactions with the secondary binding pockets of both target proteins, while binding modes within the orthosteric binding pocket were preserved. The combined methodology of ligand-based and structure-based methods was validated prospectively, resulting in the identification of hits with nanomolar affinity and ten-fold to ten thousand-fold selectivities. Á.A.K. and G.M.K. were supported by the National Brain Research Program (2017-1.2.1-NKP-2017-00002). K.R. is grateful for the ETIUDA scholarship of the National Science Center, Poland. J.B. and M.I.L. are grateful for the support from the Spanish Ministerio de Economía y Competitividad (SAF2017-85225-C3-1-R).